Data Integrity in its broadest meaning refers to the trustworthiness of system resources over their entire life cycle. In more analytic terms, it is "the representational faithfulness of information to the true state of the object that the information represents, where representational faithfulness is composed of four essential qualities or core attributes: completeness, currency/timeliness, accuracy/correctness and validity/authorization. The concept of business rules is already widely used and is subdivided into six categories, which include data rules. Data rules are further subdivided into Data Integrity Rules, data sourcing rules, data extraction rules, data transformation rules and data deployment rules.
Data Integrity is very important in database operations in particular and in Data Warehousing and Business Intelligence in general. Because Data Integrity ensures that data is of high quality, correct, consistent and accessible, it is important to follow the rules governing Data Integrity.
A Data Value Rule or Conditional Data Value Rule specifies data domains. The difference between the two is that the former specifies the domain of allowable values for a data attribute in all situations, while the latter applies only when certain conditions or exceptions hold.
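The distinction between the two kinds of value rule can be illustrated with a minimal sketch. The `Order` record, the set of allowed statuses and the rule that a shipped order needs a ship date are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical domain of allowable values for the status attribute.
ALLOWED_STATUSES = {"pending", "shipped", "cancelled"}


@dataclass
class Order:
    status: str
    ship_date: Optional[str] = None  # ISO date string, or None if not shipped


def check_data_value_rule(order: Order) -> bool:
    """Unconditional rule: status must always come from the allowed domain."""
    return order.status in ALLOWED_STATUSES


def check_conditional_data_value_rule(order: Order) -> bool:
    """Conditional rule: a ship date is required only when status is 'shipped'."""
    if order.status == "shipped":
        return order.ship_date is not None
    return True
```

The first check is applied to every record, whereas the second constrains `ship_date` only for records that meet its condition.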
A Data Structure Rule defines the cardinality of data for a data relation in cases where no conditions or exceptions apply. This rule makes the data structure easy to understand. A Conditional Data Structure Rule is slightly different in that it governs data cardinality for a data relation when conditions or exceptions apply.
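A cardinality check of this kind might be sketched as follows. The relation between orders and order lines, the minimum of one line per order and the exemption for draft orders are assumptions made for the example, not rules taken from the source.

```python
from typing import Dict, List, Optional, Set


def violates_cardinality(count: int, minimum: int, maximum: Optional[int]) -> bool:
    """Return True when count lies outside [minimum, maximum]; None means no upper bound."""
    if count < minimum:
        return True
    return maximum is not None and count > maximum


def check_structure_rule(lines_per_order: Dict[str, List[str]]) -> List[str]:
    """Unconditional rule (assumed): every order has one or more order lines."""
    return [order_id for order_id, lines in lines_per_order.items()
            if violates_cardinality(len(lines), 1, None)]


def check_conditional_structure_rule(lines_per_order: Dict[str, List[str]],
                                     draft_orders: Set[str]) -> List[str]:
    """Conditional rule (assumed): draft orders are exempt and may have zero lines."""
    return [order_id for order_id in check_structure_rule(lines_per_order)
            if order_id not in draft_orders]
```

Both functions return the identifiers of the orders that violate the rule, so an empty list means the relation satisfies it.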
A Data Derivation Rule specifies how a data value is derived, based on an algorithm, contributors and conditions. It also specifies the conditions under which the data value can be re-derived.
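As a rough illustration, a derivation rule for an order total could look like the sketch below. The `Line` record, the summing algorithm and the re-derivation condition are hypothetical; here the contributors are the order lines, and the value is re-derived whenever the stored total no longer matches them.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List


@dataclass
class Line:
    quantity: int
    unit_price: Decimal


def derive_order_total(lines: List[Line]) -> Decimal:
    """Algorithm: sum of quantity * unit_price over all contributing lines."""
    return sum((line.quantity * line.unit_price for line in lines), Decimal("0"))


def needs_rederivation(stored_total: Decimal, lines: List[Line]) -> bool:
    """Re-derivation condition: the stored value no longer matches its contributors."""
    return stored_total != derive_order_total(lines)
```

`Decimal` is used rather than floating point so that the derived monetary value is exact.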
A Data Retention Rule specifies how long data values can be retained in a particular database. It also specifies what can be done with data values when their use for a database expires. A data occurrence retention rule specifies how long a data occurrence is retained and what can be done with it when it is no longer useful. A data attribute retention rule is similar, but it applies only to specific data values rather than to the entire data occurrence.
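The difference between occurrence-level and attribute-level retention can be sketched as follows. The retention periods, the `card_number` attribute and the choice to archive expired occurrences and blank expired attributes are assumptions made for the example.

```python
from datetime import date, timedelta
from typing import List, Tuple

# Assumed policy: whole records are kept for roughly seven years,
# but the card_number attribute is kept for only 90 days.
OCCURRENCE_RETENTION = timedelta(days=7 * 365)
ATTRIBUTE_RETENTION = {"card_number": timedelta(days=90)}


def apply_retention(records: List[dict], today: date) -> Tuple[List[dict], List[dict]]:
    """Return (kept, archived) after applying both retention rules."""
    kept, archived = [], []
    for record in records:
        age = today - record["created_on"]
        if age > OCCURRENCE_RETENTION:
            # Occurrence rule: the entire record has expired.
            archived.append(record)
            continue
        cleaned = dict(record)  # copy so the input record is not modified
        for attribute, limit in ATTRIBUTE_RETENTION.items():
            if attribute in cleaned and age > limit:
                # Attribute rule: only this value has expired.
                cleaned[attribute] = None
        kept.append(cleaned)
    return kept, archived
```

A record older than the occurrence limit leaves the active set entirely, while a younger record stays but may lose individual values whose own retention period has passed.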
These Data Integrity Rules, like any other rules, are meaningless unless they are implemented and enforced.
In order to achieve Data Integrity, these rules should be consistently and routinely applied to all data entering the Data Warehouse or any other Data Resource. There should be no waivers or exceptions to the enforcement of these rules, because even a slight relaxation of enforcement could result in serious errors.
Wherever possible, these Data Integrity Rules should be implemented as close to the initial capture of data as possible, so that potential breaches of integrity can be detected and corrected early. This can greatly reduce the errors and inconsistencies that enter the database.
With strict implementation and enforcement of these Data Integrity Rules, data error rates could be much lower, so less time is spent troubleshooting and tracing faulty computing results. This translates into savings in labor costs.
With a low error rate, higher-quality data is available to support the statistical analysis, trend and pattern spotting, and decision-making tasks of a company. In today's digital age, information is one major key to success, and having the right information means having a better edge over competitors.
" [1]
Most narrowly, data with integrity has a complete or whole structure. All characteristics of the data including business rules, rules for how pieces of data relate, dates, definitions and lineage must be correct for data to be complete.
Per the discipline of data architecture, when functions are performed on the data the functions must ensure integrity. Examples of functions are transforming the data, storing the history, storing the definitions (Metadata) and storing the lineage of the data as it moves from one place to another. The most important aspect of data integrity per the data architecture discipline is to expose the data, the functions and the data's characteristics.
Data that has integrity is identically maintained during any operation (such as transfer, storage or retrieval). Put simply in business terms, data integrity is the assurance that data is consistent, certified and can be reconciled.
In terms of a database, data integrity refers to the process of ensuring that a database remains an accurate reflection of the universe of discourse it is modelling or representing. In other words, there is a close correspondence between the facts stored in the database and the real world it models.[2]
Data integrity is normally enforced in a database system by a series of integrity constraints or rules. Three types of integrity constraints are an inherent part of the relational data model: entity integrity, referential integrity and domain integrity.
Entity integrity concerns the concept of a primary key. Entity integrity is an integrity rule which states that every table must have a primary key and that the column or columns chosen to be the primary key should be unique and not null.
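Entity integrity can be demonstrated with a short sketch using SQLite through Python's standard `sqlite3` module. The `customer` table and its columns are hypothetical. Note that SQLite, unlike the SQL standard, does not treat a non-integer `PRIMARY KEY` column as implicitly `NOT NULL`, so the example declares that constraint explicitly.

```python
import sqlite3
from typing import List, Tuple


def demo_entity_integrity() -> Tuple[List[tuple], int]:
    """Try to insert a duplicate key and a null key; return the rejected rows and the row count."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer ("
        " customer_id TEXT NOT NULL PRIMARY KEY,"  # unique and not null
        " name TEXT)"
    )
    conn.execute("INSERT INTO customer VALUES ('C1', 'Ada')")

    rejected = []
    for row in [("C1", "Duplicate key"), (None, "Missing key")]:
        try:
            conn.execute("INSERT INTO customer VALUES (?, ?)", row)
        except sqlite3.IntegrityError:
            # The database itself refuses rows that break entity integrity.
            rejected.append(row)

    count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
    conn.close()
    return rejected, count
```

Both offending rows are refused by the database, and only the original row remains in the table.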
Referential integrity concerns the concept of a foreign key. The referential integrity rule states that any foreign key value can only be in one of two states. The usual state of affairs is that the foreign key value refers to a primary key value of some table in the database. Occasionally, and this will depend on the rules of the business, a foreign key value can be null. In this case we are explicitly saying that either there is no relationship between the objects represented in the database or that this relationship is unknown.
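The two permitted states of a foreign key can be shown with a minimal SQLite sketch. The `department` and `employee` tables are hypothetical, and SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3
from typing import List, Tuple


def demo_referential_integrity() -> Tuple[bool, List[tuple]]:
    """Return whether a dangling reference was rejected, plus the rows actually stored."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
    conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE employee ("
        " emp_id INTEGER PRIMARY KEY,"
        " dept_id INTEGER REFERENCES department(dept_id))"
    )
    conn.execute("INSERT INTO department VALUES (10)")

    conn.execute("INSERT INTO employee VALUES (1, 10)")    # refers to an existing key
    conn.execute("INSERT INTO employee VALUES (2, NULL)")  # no or unknown relationship

    try:
        conn.execute("INSERT INTO employee VALUES (3, 99)")  # department 99 does not exist
        dangling_rejected = False
    except sqlite3.IntegrityError:
        dangling_rejected = True

    rows = conn.execute(
        "SELECT emp_id, dept_id FROM employee ORDER BY emp_id"
    ).fetchall()
    conn.close()
    return dangling_rejected, rows
```

Whether a null foreign key is acceptable depends on the rules of the business; declaring the column `NOT NULL` would rule out the second state.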
Domain integrity specifies that all columns in a relational database must be declared upon a defined domain. The primary unit of data in the relational data model is the data item. Such data items are said to be non-decomposable or atomic. A domain is a set of values of the same type. Domains are therefore pools of values from which the actual values appearing in the columns of a table are drawn.
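One common way to approximate a domain in SQL is a column type combined with `CHECK` constraints. The sketch below uses SQLite with a hypothetical `product` table. Because SQLite is flexibly typed by default, the example also checks `typeof()` so that non-integer values are excluded from the price domain.

```python
import sqlite3
from typing import List


def demo_domain_integrity() -> List[tuple]:
    """Insert candidate rows and return those rejected for falling outside a column's domain."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product ("
        " product_id INTEGER PRIMARY KEY,"
        " price_cents INTEGER NOT NULL"
        "   CHECK (typeof(price_cents) = 'integer' AND price_cents >= 0),"
        " size TEXT NOT NULL CHECK (size IN ('S', 'M', 'L')))"
    )
    candidates = [
        (1, 500, "M"),      # inside both domains
        (2, -1, "M"),       # price outside the non-negative domain
        (3, 500, "XL"),     # size outside the enumerated domain
        (4, "cheap", "S"),  # price is not an integer at all
    ]
    rejected = []
    for row in candidates:
        try:
            conn.execute("INSERT INTO product VALUES (?, ?, ?)", row)
        except sqlite3.IntegrityError:
            rejected.append(row)
    conn.close()
    return rejected
```

Only the first candidate row draws all of its values from the declared domains, so the other three are refused.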
If a database supports these features, it is the responsibility of the database to ensure data integrity as well as the consistency model for data storage and retrieval. If a database does not support these features, it is the responsibility of the application to ensure data integrity while the database supports the consistency model for data storage and retrieval.
Having a single, well-controlled and well-defined data integrity system increases stability (one centralized system performs all data integrity operations), performance (all data integrity operations are performed in the same tier as the consistency model), re-usability (all applications benefit from a single centralized data integrity system), and maintainability (one centralized system for all data integrity administration).
Today, since most modern relational databases support these features (see Comparison of relational database management systems), it has become the de facto responsibility of the database to ensure data integrity. Outdated and legacy systems that use file systems (text, spreadsheets, ISAM, flat files, etc.) for their consistency model lack any kind of data integrity model. This requires companies to invest a large amount of time, money and personnel in creating data integrity systems on a per-application basis, effectively duplicating the data integrity systems already found in modern databases. Many companies, and indeed many database vendors themselves, offer products and services to migrate outdated and legacy systems to modern databases that provide these data integrity features. This offers companies substantial savings in time, money and resources, because they do not have to develop per-application data integrity systems that must be refactored each time business requirements change.
An example of a data integrity mechanism is the parent and child relationship of related records. If a parent record owns one or more related child records, all of the referential integrity processes are handled by the database itself, which automatically ensures the accuracy and integrity of the data so that no child record can exist without a parent (also called being orphaned) and no parent loses its child records. It also ensures that no parent record can be deleted while it owns any child records. All of this is handled at the database level and does not require coding integrity checks into each application.
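The deletion behaviour described here can be sketched with SQLite. The `parent` and `child` tables are hypothetical, and `ON DELETE RESTRICT` is one of several referential actions a schema designer might choose. The application code contains no integrity check of its own; it only observes the database refusing the delete.

```python
import sqlite3
from typing import Tuple


def demo_parent_child() -> Tuple[bool, int]:
    """Return whether deleting an owning parent was blocked, plus the final parent count."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
    conn.execute("CREATE TABLE parent (parent_id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE child ("
        " child_id INTEGER PRIMARY KEY,"
        " parent_id INTEGER NOT NULL"
        "   REFERENCES parent(parent_id) ON DELETE RESTRICT)"
    )
    conn.execute("INSERT INTO parent VALUES (1)")
    conn.execute("INSERT INTO child VALUES (100, 1)")

    try:
        conn.execute("DELETE FROM parent WHERE parent_id = 1")
        delete_blocked = False
    except sqlite3.IntegrityError:
        delete_blocked = True  # the parent still owns a child record

    # Once the child is removed, the parent can be deleted.
    conn.execute("DELETE FROM child WHERE parent_id = 1")
    conn.execute("DELETE FROM parent WHERE parent_id = 1")
    remaining = conn.execute("SELECT COUNT(*) FROM parent").fetchone()[0]
    conn.close()
    return delete_blocked, remaining
```

Declaring `parent_id` as `NOT NULL` means a child can never be stored without a parent, which corresponds to the rule that no record may be orphaned.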